Improving Performance of Classification Models with Textual Data

Authors

  • Donghui Shi
  • Jian Guan
  • Jozef M. Zurada

Abstract

The main objective of this study is to measure the effect of unstructured text on classification performance. A large dataset of aviation incident reports was used. Human factors account for close to 90% of aviation incidents. Therefore, accurate identification of the presence of human factors in past aviation incidents is critical to improving aviation safety. Most existing research on identifying the causes of aviation incidents is based on manual work by experts and often relies only on structured data. In this study, a text mining approach was used to extract topics from the text narratives in incident reports to detect the presence of human factors in the incidents. These topics, along with the structured data in the incident reports, were then used to build classifiers using three different models: memory-based reasoning (MBR), support vector machines (SVM), and decision trees (DT). Preliminary experimental results show that the best model in terms of overall classification rate is the DT model with textual and structured data. The best model in terms of sensitivity is the SVM model with textual and structured data. These results show that adding textual data can potentially improve classification performance.
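The abstract describes a pipeline with four steps. First, derive topic features from each narrative. Second, append them to the structured fields. Third, train a classifier. Fourth, compare models by overall classification rate and sensitivity. The Python sketch below illustrates that flow under several stated assumptions:

- The vocabularies in `TOPICS` are hypothetical keyword lists that stand in for the study's extracted topics. The abstract does not say how topics were derived or scored.
- Memory-based reasoning is approximated as a plain k-nearest-neighbour majority vote.
- The helper names `topic_features`, `combine`, `knn_predict` and `evaluate` are illustrative only.
- The SVM and decision-tree models are not shown.

```python
import math
from collections import Counter

# HYPOTHETICAL topic vocabularies: placeholders for the study's extracted topics,
# which the abstract does not list.
TOPICS = {
    "fatigue": {"tired", "fatigue", "rest"},
    "communication": {"readback", "clearance", "misunderstood"},
}


def topic_features(narrative):
    """Share of whitespace-separated tokens that fall in each hypothetical topic."""
    tokens = narrative.lower().split()
    n = len(tokens) or 1  # avoid division by zero on empty narratives
    return [sum(t in vocab for t in tokens) / n for vocab in TOPICS.values()]


def combine(structured, narrative):
    """Concatenate already-numeric structured fields with the textual topic features."""
    return list(structured) + topic_features(narrative)


def knn_predict(train_X, train_y, x, k=3):
    """MBR approximated as a k-nearest-neighbour majority vote (Euclidean distance)."""
    nearest = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]


def evaluate(y_true, y_pred, positive=1):
    """Return (overall classification rate, sensitivity) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    actual_pos = sum(t == positive for t in y_true)
    sensitivity = true_pos / actual_pos if actual_pos else float("nan")
    return correct / len(pairs), sensitivity


if __name__ == "__main__":
    # Toy, made-up records: (structured fields, narrative, human-factor label).
    records = [
        ([1.0, 0.0], "crew was tired after long duty", 1),
        ([0.0, 1.0], "engine vibration noted in cruise", 0),
        ([1.0, 1.0], "pilot misunderstood clearance readback", 1),
        ([0.0, 0.0], "bird strike on approach", 0),
    ]
    X = [combine(s, text) for s, text, _ in records]
    y = [label for _, _, label in records]
    preds = [knn_predict(X, y, xi, k=1) for xi in X]
    print(evaluate(y, preds))
```

Sensitivity is reported separately from the overall rate because the study ranks models on both. A model can score well on overall accuracy while still missing many incidents that involve human factors.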

Similar resources

A New Hybrid Method for Improving the Performance of Myocardial Infarction Prediction

Abstract Introduction: Myocardial Infarction, also known as heart attack, normally occurs due to such causes as smoking, family history, diabetes, and so on. It is recognized as one of the leading causes of death in the world. Therefore, the present study aimed to evaluate the performance of classification models in order to predict Myocardial Infarction, using a feature selection method tha...

Improving reservoir rock classification in heterogeneous carbonates using boosting and bagging strategies: A case study of early Triassic carbonates of coastal Fars, south Iran

An accurate reservoir characterization is a crucial task for the development of quantitative geological models and reservoir simulation. In the present research work, a novel view is presented on the reservoir characterization using the advantages of thin section image analysis and intelligent classification algorithms. The proposed methodology comprises three main steps. First, four classes of...

Providing a New Model to Improving DEA-based Models in Multi-criteria Inventory Classification (Case Study: Pars Khazar)

Abstract Objective: Many organizations use the ABC classification method to control their large amount of inventories. The most common way to classify inventories is the ABC method. In traditional ABC classification, items are only classified according to one criteria. But there are other criteria that need to be considered in the inventory classification. The purpose of this study is to prese...

Dimensionality Reduction and Improving the Performance of Automatic Modulation Classification using Genetic Programming (RESEARCH NOTE)

This paper shows how we can make advantage of using genetic programming in selection of suitable features for automatic modulation recognition. Automatic modulation recognition is one of the essential components of modern receivers. In this regard, selection of suitable features may significantly affect the performance of the process. Simulations were conducted with 5db and 10db SNRs. Test and ...

How Effectiveness Of Comprehensive Performance Measurement Systems on Manager's Performance Through Modification of Mental Models (Learning Process)

One of the ways to reduce agency costs is to plan for the creation of effective decision-making information by designing appropriate comprehensive performance evaluation systems according to managers' learning process One of the important factors in the processing and classification of information for cognitive learning is mental models that are categorized in two dimensions of mental model co...

Publication date: 2014